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Abstract 

We propose an iterative algorithm for the minimization of a l\- 
norm penalized least squares functional, under additional linear con- 
straints. The algorithm is fully explicit: it uses only matrix multipli- 
cations with the three matrices present in the problem (in the linear 
constraint, in the data misfit part and in penalty term of the func- 
tional). None of the three matrices must be invertible. Convergence 
is proven in a finite-dimensional setting. We apply the algorithm to a 
synthetic problem in magneto-encephalography where it is used for the 
reconstruction of divergence- free current densities subject to a sparsity 
promoting penalty on the wavelet coefficients of the current densities. 
We discuss the effects of imposing zero divergence and of imposing 
joint sparsity (of the vector components of the current density) on the 
current density reconstruction. 

1 Introduction 

In magneto-encephalography (MEG) an image of an electrical current density 
is reconstructed from measurements of the magnetic field outside the scalp. 
The magnetic field generated by these currents is very weak compared to 
the environment; special precautions are taken to minimize these external 
effects on the observed data. Another characteristic of MEG imaging is the 
very low number of data: typically only a few hundred of measurements 
are taken. When using a current density representation of reasonable size 
(spatial resolution) these data are not complete. This implies having to 
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solve an underdetermined system of equations. Extra conditions need to be 
imposed to define a unique current density reconstruction. 

In [12] the use of a sparsity promoting penalty, together with an efficient 
representation of the current density in terms of wavelets [9j[T8] was proposed. 
In other words, the assumption that the unknown current density can be 
represented with a small number of non-zero wavelet coefficients, was used 
as a priori information to regularize the inversion. 

The regularization of the MEG inverse problem by a sparsity assumption 
was carried out in practice in [12] by adding an ^i-norm penalty term to a 
quadratic cost function for the data misfit. The £i-norm is a popular sparsity 
promoting penalty [5], H] as it allows for convex optimization techniques to 
be used instead of algorithms with combinatorial complexity. It was shown 
in [TU] that such a penalty regularizes the linear inverse problem, and con- 
vergence of an iterative soft-thresholding algorithm for the minimization of 
an ^i-norm penalized least squares functional was proven in a Hilbert space 
setting. In the present work we extend the method of [T21 EH] to incorpo- 
rate linear constraints (e.g. to impose zero divergence on the reconstructed 
current densities) and to handle a more general sparsity promoting penalty. 

In the first, mathematical, part of this paper we propose a new itera- 
tive algorithm for the minimization of an £i-norm penalized least squares 
functional, under additional linear constraints: 



Here K is the matrix that defines the linear relation between the unknown 
model x and the data y; Bx = b is a linear constraint on the solution. A is a 
matrix mixing the variables in the non-smooth penalty term It need 

not be invertible in our approach (e.g. A = grad would correspond to a total 
variation penalty [20]). We write the variational equations corresponding 
to this problem and derive a simple iterative algorithm. This algorithm 
consists of a single loop and each step in the loop is given explicitly in terms 
of matrix multiplications by K, A and B (and their transposes). We prove 
the convergence of this algorithm for general K, A and B (subject to a bound 
on their norms) in a finite-dimensional setting. In problem (pQ) the £i-norm 
penalty may be replaced by another convex function H(Ax). The proposed 
algorithm can be modified to apply to this well, as long the proximity 

operator of H is known. 

The proposed algorithm reduces to the generalized iterative soft-thresholding 
algorithm of [17] when the linear constraints Bx = b are removed. We will 
indicate below in which special cases (e.g. A = Id) our algorithm is also 
derivable by the method in [23]. In these special cases we also comment on 




x = arg min \\Kx — y\\ 2 + 2A||Ar||i. 



Bx=b 
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the conditions on the matrices that are necessary for guaranteeing conver- 
gence; in particular, we indicate where our conditions on the matrices K, A 
or B are less strict than the ones derivable from the work in [23] . 

In the second part of this paper, we apply the algorithm to an inverse 
problem loosely based on magneto-encephalography. We shall assume a linear 
relationship between an unknown current density J and a measured magnetic 
field B in the form of the Biot-Savart law. Furthermore we assume that the 
data are contaminated by Gaussian noise. As in [12] we will impose sparsity 
on the wavelet expansion of the current density J. That is, we will use 
the proposed iterative algorithm to solve problem ([T]) where x is the current 
density J, K a matrix corresponding to a discretized Biot-Savart law, A the 
wavelet transform, and Bx = b represents the linear constraints div(J) = 0. 
In this last point lies the main difference with the simulations in [T2] : we 
shall incorporate into the reconstruction procedure the assumption that the 
current density is divergence-free; this was not done in [12]. Our approach 
does not use divergence-free wavelets [HI [151 122], but relies on an explicit 
linear constraint for finding a divergence-free reconstruction. 

On a synthetic problem, we investigate the effect of the divergence-free 
nature of the current distribution on the sparse reconstruction. That is, we 
compare reconstructions from the same measurement data with and without 
the constraint. Secondly, we investigate the effect on the reconstruction of 
imposing a "joint sparsity" [13] condition on the two vector components of 
the current density. This means that at each position both vector components 
are simultaneously zero or simultaneously non-zero. Joint sparsity in MEG 
was first discussed in [12] as a way of improving reconstruction quality. The 
proposed iterative algorithm can also handle penalties that promote joint 
sparsity (see discussion at the end of Section E]). 

Besides MEG, other applications of ^-penalized least squares under linear 
constraints exist. One is found in the portfolio selection problem described 
in [3]. Such problems are often of a smaller size (fewer variables) and can 
sometimes also be solved via a non-iterative procedure. Another application 
of the minimization problem (|T|) is found in the formulation of a modified 
Total Variation model in the context of image processing tasks |16j . 

In this paper we will make frequent use of the (non- linear) soft-thresholding 
operator which is defined by: 




(2) 



3 



and of the projection on the ball of radius A defined by: 

„ / \ "nrA \z\ > A 

= w<a. < 3 > 

We have that: 

S x (z) + P x (z)=z (4) 
for all z. We set = {u with ||u||oo < A} (foe-ball of radius A). 



2 Description of the iterative algorithm 

The variational equations of the minimization problem (pQ) can be obtained 
by the introduction of Lagrange multipliers v: 

mm\\Kx — y\\ 2 + 2\\\Ax\\i — 2(v, Bx — b) and Bx = b. (5) 

X 

Derivation with respect to x yields: 

K T (Kx -y) + A T w - B T v = and Bx = b, 

where w is an element of the sub differential of A|| i.e. Wi = A (Ax)i/\(Ax)i 
if (Ax)i 7^ and \wi\ < A if (Ax)i = 0. This can be written more compactly 
as (Ax)i = S\(wi + (Ax)i) or equivalently = P\{wi + (Ax) A. We find that 
the variational equations corresponding to the problem ([T]) therefore are: 

K T (Kx -y) + A T w - B T v = 0, w = F x (w + Ax) and Bx = b, (6) 

where F\(u) corresponds to the application of P\ (defined in ( formula [3])) 
on each component of u. We assume that a solution to these equations exists 
and try to derive an iterative algorithm that converges to such a solution. 
By writing A||Ac||i = max\\ w \\ ao <x(w , Ax), the minimization problem ([1]) can 
expressed as: 

min max F(x,w,v), (7) 

x,Bx=bw£B™ 

where we have set: 

r [X, w, v) = \\Kx -y\\ 2 + 2(w,Ax) -2(v,Bx-b). (8) 
We write the variational equations as fixed-point equations: 
w = P a O + Ax) 

x = x + K T (y - Kx) + B T v - A T w (9) 
v = v — \ (Bx — b) 

1 



and study the predictor-corrector scheme: 

( ^n+l = v n _ ( Bx n _ 

x n+1 = x n + K T (y - Kx n ) + B T v n+1 - A T w n 

w n+l = f x ( w n + Ax n+1 ) (10) 
x n+l = x n + K T( y _ Rx nj + _ 

n+1 „.n 1 I 



V 1 = f 



There is a predictor-corrector step on the variables v and x but not on to. 
Clearly, the fixed-point of this iteration is a solution to the variational equa- 
tions (J6]). Moreover the algorithm is fully explicit: Each step only requires 
the application of the matrices K, A, B (and their transposes) and a simple 
projection (see formula There is no non-trivial sub-problem to solve 
in each step (such as e.g. solving a linear system of equations). In other 
words, there is no inner loop required for any of the lines in ffTOj) . In the next 
section we show that, under certain conditions on the operators A, B and K, 
and on the parameter a, the proposed algorithm fflUj) converges to a solution 
of the variational equations (EJ), and to a minimizer of the functional ([T]). 

For the special case when B = and 6 = (absence of linear constraints), 
the algorithm ( flUj) reduces to: 

x n+i = x n + x T {y - Kx n ) - A T w n 

w n+l = p A (V« + ^ n+1 ) (11) 
x n+l = x n + X T {y - Kx n ) - A T U7 n+1 . 

This algorithm was presented in [T7] to solve the problem: 

x = argmin \\Kx — y\\ 2 + 2A|| Ac||i. (12) 

X 

An important application is the total variation penalty in image analysis 
{A = grad). A similar algorithm (with predictor-corrector step on the w 
variable) was proposed independently in [2] for Poisson data. When A = Id, 
and using relation (j3J), algorithm ( II ip further simplifies to the traditional 
iterative soft-thresholding algorithm: 

x n+1 = § A (x n + K T {y - Kx n )) , (13) 

where §a(w) corresponds to the application of S\ (defined in formula ([2])) on 
each component of u. This algorithm was discussed in [10] for solving 

min \\Kx — y\\ 2 + 2A||a;||i. (14) 

X 

An accelerated version of this algorithm, the Fast Iterative Soft-Thresholding 
Algorithm (FISTA), was derived in pp. Many other algorithms exist as well. 
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On the other hand, when the constraints Bx = b are maintained, and 
with A equal to the identity, the algorithm ffTUl) reduces to a constrained 
version of the iterative soft-thresholding algorithm: 

v n+1 = v n - [Bx n - b) 

x^+i = s x (x n + K T (y - Kx n ) + B T v n+l ) (15) 
v n+i =v n_ l(Bx n+1 -b), 

for the problem 

x = arg min \\Kx — y\\ 2 + 2A||x||i. (16) 

Bx=b 

We will use algorithm (Tl5l) for an application in magneto-encephalography 
in section O Although it was not included in [23], algorithm (1 15]) for prob- 
lem (1161) . could also have been derived from the Bregman framework of [23] . 
At the end of section [3J we will comment on the difference between condi- 
tions of convergence that the matrices K and B have to satisfy to guarantee 
convergence of (fl5|) in our approach and in [23] . 

Finally, by setting K = and A = Id in problem ([T]) one recovers the 
so-called £j basis pursuit problem [B]: 

arg min ||x||i (17) 

Bx=b 

for which the algorithms (flOi) and (ITS"]) reduce to: 

^n+l = y n _ ^ Bx n _ 

x n+1 = § A (a; n + B T v n+1 ) (18) 
v n+1 = v n - l(Bx n+1 - b), 

This algorithm was discussed in [23] (using different notation and auxiliary 
variables) in a Bregman framework. 

Taking these special cases into account we can say that the proposed al- 
gorithm f TrOj) combines the generalized iterative soft-thresholding algorithm 
pip of [17] with the basis pursuit algorithm ( ITS]) into a single unified algo- 
rithm. 



3 Proof of convergence 

In this section, we prove the convergence of algorithm (ITU]) and show that 
this yields a minimum of functional ([T]). 

Lemma 1. If u + = F\(u~ + A), with Pa the projection on the convex set 
B™, then 

||w-w + || 2 < ||n-n"|| 2 - ||«- - u + \\ 2 - 2(u - u + , A) (19) 
for all u e Bf . 
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Proof. As Pa is the projection on a non-empty closed convex set one has: 

(u- P A («'),«'- Pa («')) <0 
for all it G 5^° an d an u '- Choosing u' = u~ + A and Pa(m') = u + yields 

(u - u + ,u" + A - u + ) < 0. 

Replacing (u — u + , u~ — u + ) by — 'u + || 2 + \\u~ — u + \\ 2 — \\u — u~ || 2 ) /2 
yields 

||u - u + \\ 2 + \\u~ - u + \\ 2 - ||« - u~|| 2 + 2(u - u + , A) < 

which is the desired result. □ 

The operator Pa and the convex set in Lemma [I] may be replaced 
with a projection on any non-empty closed convex set. In particular, when 
u + = u~ + A (i.e. B^ replaced by the whole space and Pa replaced by the 
identity), one has that: 

|| M - M +|| 2 = \\ u - u -\\ 2 - \\ u ~ - u + \\ 2 -2(u-u + ,A) (20) 

for all u. 

Lemma 2. If (x n+l , w n+1 , v n+1 ) and (x n , w n , v n ) are related by iteration (TT 
then 



n+1 || 2 + \\w - w n+1 \\ 2 + a\\v - v n+1 \\ 2 < 



\\x — x n \\ 2 + \\w — w n \\ 2 + a\\v — v n \\ 2 — \\x n — x n+1 || 2 
-\\w n - w n+1 || 2 - a\\v n - v n+1 \\ 2 - \\K(x - x n )\\ 2 
+ \\K(x n - x n+l )\\ 2 - \\B(x - x n )\\ 2 + \\B(x - x n+l )\\ 2 
+ \\B{x n - x n+1 )\\ 2 - \\A T {w - w n )\\ 2 
+ \\A T (w - w n+1 )\\ 2 + \\A T (w n+1 - w n )\\ 2 
+F(x, w n+ \ v n+1 ) - F(x n+1 , w, v) 
+2^(B(x - x n+1 ), Bx n+1 - b) 

for all x, v and all w £ . 

Proof. From Lemma [T] and equation (12"01 . we find: 

||x — x™ +1 || 2 + \\w — w n+1 \\ 2 + a\\v — f n+1 || 2 < ||x — x n \\ 2 + 1 1 it; — w n \\ 2 
+a\\v - v n \\ 2 - \\x n - x n+1 \\ 2 - \\w n - w n+1 \\ 2 - a\\v n - v n+1 \\ 2 
-2(x - x n+ \ K T (y - Kx n )) - 2(x - x n+ \ B T v n+1 - A T w n+1 ) 
-2(w - w n+ \ Ax n+1 ) + 2(v- v n+1 , Bx n+1 - b). 



(21) 
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As ((TDP implies that v n+1 = v n+l + B(x n+1 - x 11 ) + ^(Bx n+1 - b) and that 
x n+l = x n+1 + A T (w n+1 — w n ), this can be written as: 

||x — x™ +1 || 2 + \\w — w n+1 \\ 2 + a\\v — v n+1 \\ 2 < \\x — x n \\ 2 + 1 1 it; — w n \\ 2 
+a\\v - v n \\ 2 - \\x n - x n+1 \\ 2 - \\w n - w n+1 \\ 2 - a\\v n - v n+1 \\ 2 
-2(K(x - x n+1 ),y - Kx n ) + 2(A(x - x n+1 ), w n+l ) 
-2(B(x - x n+1 ), v n+1 ) - 2(B(x - x n+1 ), B(x n+1 - x n )) 
+2^{B(x - x n+l ),Bx n+1 - b) - 2{w - w n+1 , Ax n+l ) 
-2(A T (w - w n+1 ), A T (w n+1 - w n )) + 2(v - v n+1 , Bx n+1 - b). 

The terms in {v n+1 , Bx n+1 ) and (w n+1 , Ax n+1 ) drop and the remaining terms 
can be re-arranged to yield: 

\\x - x n+l \\ 2 + 1 1 w - w n+l \\ 2 + a\\v - v n+l \\ 2 < \\x - x n \\ 2 + \\w - w n \\ 2 
+a\\v - v n \\ 2 - \\x n - x n+1 \\ 2 - \\w n - w n+1 \\ 2 - a\\v n - f n+1 || 2 
-2(K(x - x n+1 ),y - Kx n ) - 2(A T (w - w n+1 ), A T (w n+1 - w n )} 
-2(B(x - x n+1 ), B(x n+1 - x n )) + 2^(B(x - x n+1 ), Bx n+1 - b) 
+2(w n+ \ Ax) - 2{w, Ax n+l ) - 2(u n+¥ , Bx-b) + 2(v, Bx n+1 - b). 

By rewriting the following inner products: 

-2{K(x - x n+1 ), y - Kx n ) = \\Kx - y\\ 2 - \\Kx n+1 - y\\ 2 

-\\K(x - x n )\\ 2 + \\K(x n - x n+1 )|| 2 

-2(A T (w - w n+1 ), A T (w n+1 - w n )) = \\A T (w-w n+1 )\\ 2 

+ \\A T (w n+1 - w n )\\ 2 - \\A T (w - w n )\\ 2 

-2(B(x - x n+1 ), B(x n+1 - x n )) = \\B(x - x n+1 )\\ 2 

+ \\B(x n+1 - x n )\\ 2 - \\B(x - x n )\\ 2 , 

and by using the expression (jHJ) of F(x, v, w) in: 

2(w n+1 , Ax) - 2(«;, Ax n+l ) - 2(v n+1 , Bx-b)+ 2(v, Bx n+l - b) = 

F(x, w n+ \ v n+1 ) - F(x n+1 ,w, v) -\\Kx- y\\ 2 + \\Kx n+1 -y\\ 2 , 

the previous inequality can be written as ()21|) . which proves the lemma. □ 

Lemma 3. If (x, w, v) satisfies the variational equations then 

F(x, w, v) — F(x,w,v) < -\\K(x -x)\\ 2 (22) 

for all x, v and all w G . 

Proof. To prove inequality (1221) . we use Lemma [2], where we replace both 
(x n , w n , v n ) and (x n+1 , w n+1 , v n+1 ) by (x, w, v); this is allowed because (x, w, v) 
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satisfies OH]) which are the fixed-point equations of algorithm ( TTUj) . Then re- 
lation ( T2~Tj) becomes: 

II x — xll 2 + \\w — w\\ 2 + a\\v — v\\ 2 <\\x — x\\ 2 + \\w — w\\ 2 + a\\v — v\\ 2 



-\\K(x-x)\\ 2 

- \\B(x-x)\\ 2 + \\B(x -x)\\ 2 

- \\A T (w-w)\\ 2 + \\A T (w-w)\\ 2 
+ F(x, w, v) — F(x, w, v), 

(23) 



for all x, v and all w G B™. This implies inequality (1221) . □ 

We now show that a solution of equations ([6]) solves the minimization 
problem (JTJ). 

Theorem 1. If (x,w,v) satisfies the variational equations then x is a 
solution of the minimization problem (Qp. 

Proof. If (x, w, v) is a solution of ([6]) it follows from Lemma|3]that F(x, w, v) < 
F(x, w, v) for all x, v and all w G B™, which means: 

\\Kx - y\\ 2 + 2(w, Ax) < \\Kx - y\\ 2 + 2(w, Ax) - 2(v, Bx-b). 
Taking the maximum over w G B^ in the left hand side gives 

|| J 0-y|| 2 + 2A||Ar||i < \\Kx - y\\ 2 + 2{w, Ax) -2(v,Bx-b) 
and since (w,Ax) < maxiuN <x(w, Ax) = \\Ax\\i one finds: 

\\Kx -y\\ 2 + 2\\\Ax\\i < \\Kx -y\\ 2 + 2\\\Ax\\i -2(v,Bx-b). 

for all x, v. As we minimize under the condition that Bx = b, we have that 
(v, Bx — b) = and find 

\\Kx - y\\ 2 + 2A||Ax||i < \\Kx - y\\ 2 + 2A||Ax||i, 

for all x for which Bx = b. As Bx, = b this proves the theorem. □ 

Theorem 2. If the set {x, with Bx = b} is non-empty, \\AA T \\ < 1, 
\\^K T K + B T B\\ < 1 and a > ~, then the sequence (x n ,w n ,v n ) ne ^ defined 
by the iteration 

' v n+x = v n - {Bx n - b) 

x n+l = x n + K T( y _ Rx n} + _ ^T^n 

w n+l = ]p x ( w n + Ax n+1 ) (24) 
x n+l = x n + K T^ y _ Rx n^ + B T^n+l _ 

v n+l = v n _ L(Bx n+1 -b) 
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converges to a solution (x^,w',v*) of the variational equations and a 
solution of the minimization problem 

Proof. If {x, with Bx = b} is not empty, there is a solution to ([I]), implying 
that there exists a solution (x, w, v) to the variational equations 0- We now 
use Lemma [2] with (x,w,v) = (x,w,v) and find: 

||x — x n+1 || 2 + \\w — w n+1 || 2 + a\\v — f n+1 || 2 < \\x — x n \\ 2 + \\w — w n \\ 2 
+a\\v-v n \\ 2 - ||x n -x n+1 || 2 - \\w n -w n+1 \\ 2 - a\\v n+1 - v n \\ 2 
-\\K(x - x n )\\ 2 + \\K(x n - x n+1 )\\ 2 - \\B(x - x n )\\ 2 + \\B(x - x n+1 )\\ 2 
+ \\B(x n - x n+1 )\\ 2 - \\A T (w - w n )\\ 2 + \\A T (w - w n+1 )\\ 2 
+ \\A T (w n+1 - w n )\\ 2 - \\K(x - x n+1 )\\ 2 + 2^(B(x n+1 - x), Bx n+l - b), 

where we also used relation (j22p with (x,w,v) = (x n+1 , w n+1 , v n+1 ). As 
Bx = b we now use that: 

2 ^—^(B(x n+1 -x),Bx n+1 ~b) = 2^—^\\Bx n+1 -b\\ 2 = 2a(l-a)\\v n+1 -v n \\ 2 , 
a a 

and: 

-\\K(x-x n )\\ 2 -\\K{x n+1 -x)\\ 2 = -\\\K(x n+1 -x 



\\\K(2x - x n+1 - x n )\\ 2 



2 

< -±\\K{x n+1 -x n )\\ 2 , 

and reorder to obtain: 

||x - x" +1 || 2 - \\B(x - x n+1 )\\ 2 + \\w - w n+1 \\ 2 - \\A T (w - w n+1 )\\ 2 
+a\\v - v n+1 \\ 2 < \\x - x n \\ 2 - \\B(x - x n )\\ 2 + \\w - w n \\ 2 
-\\A T (w - w n )\\ 2 + a\\v - v n \\ 2 

_(\\ x n+l _ x n\\2 _ ±\\K(x n+1 - X n )\\ 2 - \\B(x n - X™ +1 )|| 2 ) 

-M| W «+1 - W nf _ \\A T (w n+1 - w n )\\ 2 
-a(2a- l)\\v n+1 -v n \\ 2 . 

As we assume that ||t4v4 t || < 1 and \\\K T K + B T B\\ < 1 we can introduce 
regular square matrices L, U and V by L T L = 1 — \K 1 K — B T B, U T U = 
1 - B T B and V T V = 1 - AA T to find: 

\\U(x - x n+1 )\\ 2 + \\V(w - w n+l )\\ 2 + - ^ +1 || 2 

<||Z7(x - x")|| 2 + \\V(w - w n )\\ 2 + a||0 - v n \\ 2 

-\\L(x n -x n+1 )\\ 2 -\\v(w n -w n+1 )\\ 2 1 ' 

-a(2a- l)\\v n+1 -v n \\ 2 . 
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Summing from M to N > M one finds: 

\\U(x - x N+1 )\\ 2 + \\V(w - w N+1 )\\ 2 + a\\v - v N+1 \\ 2 



<\\U{x - x M )\\ 2 + \\V{w - w M )\\ 2 + a\\v - v M \\ 2 

N 

,n+lM|2 (26) 



- (} L ^ xn - ^ n+1 )H 2 + \\V(w n - w r 
+ a(2a - l)\\v n+l - v n \\ 2 y 



n=M 



Since at > h the summation on the right hand side is negative. As U and 
V are invertible, it follows that the sequence (x n ,w n ,v n ) is bounded. And, 
as we work in a finite dimensional space, there is a convergent subsequence 
(x n . , w n . , v n . ) (x\w\v^). It also follows from inequality f l26]) that: 



n+l _ n\\2 



N 

(\\L(x n - x n+1 )\\ 2 + \\V(w n - w n+1 )\\ 2 + a{2a - l)\\v 

n=M 

< \\U(x - x M )\\ 2 + \\V(w - w M )\\ 2 + a\\v - v M \\ 2 

As a > i, ||L(x n -x n+1 )|| 2 , ||V(w n - w n+1 )|| 2 and ||w n+1 - v n \\ 2 tend to zero 
for large n, which implies that \\x n — x n+1 \\ 2 and \\w n — w n+1 \\ 2 tend to zero 
as well. It follows that the subsequence (x n . + i,w n , + i,v n , + i) also converges 
to (x\ w',v^) and that (x>, w*, v^) satisfies the fixed-point equations Qj. We 
can therefore choose (x, w, v) = (x\ w\ v') in relation fl26l) to find: 



\\U(x* - x N+l )\\ 2 + \\V(w1 - w N+l )\\ 2 + a\\vi - ^ +1 || 2 

< \\U(x^ - x M )\\ 2 + \\V{w^ - w M )f + a\\rf - v M f 

for all N > M. As there is a convergent subsequence of (x n ,w n ,v n ), the 
right hand side of this expression can be made arbitrarily small by taking 
M = rij large enough. Hence the left hand side will be arbitrarily small for 
all N larger than this M. This proves convergence of the whole sequence 
(x n ,w n , v n ) to {x\ w^, ut). 

As (x',w',v') satisfies the fixed-point equations, it follows from Theo- 
rem [T] that x' is a solution to problem ([T]). □ 



11 



Discussion 



If \\\K T K + B T B\\ > 1 or ||>L4 T || > 1 one can rescale the matrices and 
the variables to arrive at the following iteration: 

' v n+1 = v n - {Bx n - b) 
x n+1 = x n + nK T (y - Kx n ) + r 3 B T v n+1 - r^w 11 
w n+i = Px ( w n + ^Ax n+1 ) (27) 

x n+l = x n + n KT(y - K X n ) + T 3 B T V n+1 - nA T W n+1 
v n+l = y n_ l(Bx n+1 - b) 

with step size parameters r 1; r 2 , r 3 > that satisfy \\tiK t K /2+r 3 B T B\\ < 
1 and r 2 ||AA T || < 1. 

The £i-norm in problem ([1]) does not necessarily have to be defined 
as ||u||i = Yli \ u i\- I n section [5] we will use an £i-norm of the form 
Hull! = J2f max (\ u i,i\y ■ ■ ■ A u i,m\) for a vector u G M Arxm . Such a 
penalty is useful for promoting joint sparsity on the lijj (for a fixed 
i). Indeed, if e.g. is non-zero, then all other Uij (j ^ 1) may be as 
large as ju^i] as well, without increasing maxdw^i], . . . , |iii, m |). 
As Amax(|zi|, . . . , \z m \) = max^ 1 <x{w, z), one needs to replace the 
projection F\ in (1101) by N projections on an .^-ball (in M m ) of radius 
A. We denote the projection on an ^-ball of radius A in R m by Q\. 
In algorithm (I15p . that is used for the special case A = Id, one has to 
replace the component-wise soft-thresholding S\ with a new threshold- 
ing function T\ — Id — Q\ (and apply it to N vectors of size m). The 
operator T\ can be computed as follows [TH]. Let z G M m and order 
the entries such that \z^ \ > \z i2 \ > ■ ■ ■ > \zi m \. Then: 

for ||z||i < A : T x (z) = 

{T x (z))i. = sgn(^.)(^ =1 \z ik \ - j = l,...,l 

(T x (z))i. = z i6 j = 1 + 1,.. 

(28) 

where I G {1, . . . , m} is the largest index satisfying \z^\ > (J2k=i \ z h I — 
A)//. 

The £i-norm in functional ([1]) can be replaced by a convex lower semi- 
continuous function H: 

x = argmin \\Kx - y\\ 2 + 2H(Ax). (29) 

Bx=b 

(assuming a minimizer exists). The projection operator Pa in algorithm 
fTTOj) then needs to be replaced by the proximity operator of the convex 



for \\z\\i > A : 
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conjugate of H, H*, denned by H*(w) = sup x {(u>, x) — H(x)} (see e.g. 

my- 

v n - (Bx n - b) 

x n + K T (y - Kx n ) + B T v n+l - A T w n 
prox H *(w n + Ax n+1 ) (30) 
x 11 + K T {y - Kx 11 ) + B T v n+1 - A T w n+1 
v 11 - l{Bx n+1 - 6), 

which converges under the same conditions as in Theorem [2] to a min- 
imizer of problem (|2U|) . The proximity operator of H is defined as 
prox H (u) = argminu, H(w) + \\w — u\\ 2 /2, and prox H » =Id — prox^. It 
is important to remark that only the proximity operator prox H of H is 
needed, not the proximity operator of H(A-). In fact the convergence of 
algorithm fill I) (i.e. without the linear constraints Bx = b) was proven 
in this more general context in [T7j . 

• The functional J-(x) = \\Kx — y\\ 2 + 2\\\Ax\\i, evaluated in the iterates 
x n , does not decrease monotonically as a function of n. Because the 
iterates x n do not necessarily satisfy the constraint in every step, it is 
even possible that J-"(x n ) < jF{x) for some n. The constraint Bx = b 
is only satisfied in the limit n — > oo. 

• If one wants to solve the £i-norm constrained problem 

x = arg min \\Kx — y\\ 2 (31) 

Bx=6,||x||i<.R 



w n+l 



X 



n+1 
■n+1 



instead of the ^i-norm penalized problem (1161) . then one may replace 
the soft-thresholding Sa in algorithm (|15p by projection on the £i-ball. 
The algorithm is: 

w n _ ^ Bx n _ ft j 

Q R (x n + K T (y - Kx n ) + B T w n+l ) (32) 
w n - ±(Bx n+1 - b), 

where Qr is the projection on the £i-ball of radius R (such a projection 
is explicitly doable by computer; see expression f )28|) . with A replaced by 
R, and the paragraph above). This algorithm converges for \\^K T K + 
B T B\\ < 1 and a > 1/2. This can be shown by using Lemma [T] (for 
an ^x-ball instead of an ^-ball) and proceeding in the same way as in 
Theorem [2] (without proof). 



w 



n+1 



X 



n+1 



W 



n+1 
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• In the special case when A = Id, the algorithm (|15p could also have 
been obtained from algorithm (A ), formula (3.2) of [23] (it wasn't done 
explicitly). This can be achieved by the choice H — > ||| Kx — y\\ 2 , A — > 
B, J ^ ||- ||i and Q Id — K T K — B T B and C a in [23] . However, 
the assumption in [23] that Qo is positive definite (necessary to prove 
the convergence of the algorithm), amounts to having \\K T K+B T B\\ < 
1. We have shown here that the condition ||iif T .K" + B T B\\ < 1 is 
already sufficient to guarantee convergence. 

As already mentioned, when A = Id and K = the proposed algorithm 
(HOP reduces to the algorithm (jl8p for the problem (TTTj) . This algorithm 
was also (re) derived in [23] under a slightly different form (see equation 
(5.6) of [23]). 



5 Application to magneto-encephalography 

The goal of magneto-encephalography (MEG) is to determine a current den- 
sity J in the brain by measuring (a component of) the magnetic field B 
induced by J, in several points outside the scalp. We assume that B and J 
are linked by the Biot-Savart law: 

m = p f J(f') x -^- 3 dV, (33) 
4vr J v | r _ r /|° 

with /io = 47r x 10~ 7 Vs/Am and V the volume in which the current flows. 
The conservation of charges implies that div( J) = 0. For more information, 
see [TT] and references therein. 

In this section, we pose and solve a synthetic inverse problem inspired by 
this problem. We consider a thin spherical shell V centered at the origin, 
with an outer radius of 9cm and a thickness of 1mm. Measurements are made 
at 500 random points r*j (i : 1 . . . 500) uniformly distributed on the upper 
hemisphere with a distance of 10cm to the origin. The data is composed of the 
radial component of the magnetic field B r {f) measured in these 500 points. 
For simplicity all the current densities considered below do not depend on 
the radius and do not have a radial component. In other words, we treat a 
2D problem. 

In order to discretize the problem, we use the "cubed sphere" parametriza- 
tion introduced in [19]; it maps the sphere to the six sides of a cube and a 
regular grid is then used on each of these six sides. We choose a grid with 
64 2 voxels on each side. 

As input model (that we will want to reconstruct) we choose the current 
density distribution J in shown in Figure [TJ This model is divergence-free 
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Figure 1: Input model J m on the sphere (left) and its top view (right). The 
background color map is proportional to the local norm of the current density 
vector J in . 



by construction. The matrix K encodes the Biot-Savart law for the radial 
component B r (fi) = B ■ e r (fi) of B in the 500 measurement points fi (i : 
1...500). As 



xe r (n)l -J(f*)dV', 



/ / Ti-f 1 



47T Jy \ \ fi — PI 3 



(34) 



we set: 

{^KJ^i ^ voxel ' «-^voxeh -^Q, voxel ~j / 7"^ ~*7i"3 ^ dV^ 

all voxels ^ ~ r 1 

(35) 

The data y is obtained from the synthetic input model J m . More precisely, 
the data is constructed by setting y = KJ m + e, where e is Gaussian noise. 
We choose a noise level of 10%: ||e|| = 0.1 x ||KJ in ||. 

Our goal now is to reconstruct the current density on each of the voxels of 
the cubed sphere from the noisy data y. The cubed sphere parametrization 
allows us to use a simple set of wavelet functions on the sphere introduced 
in [21]. They belong to the CDF 4-2 family [7]. As the current density has 
two components, we have to reconstruct 64 2 x 6 x 2 = 49152 coefficients 
from merely 500 measurements. The problem is therefore severely under- 
determined. 

To impose the assumed sparsity of J in the wavelet basis, we will use 
an ^-penalized least squares functional, with the additional linear constraint 
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div(J) = 0. We set w equal to the list of coefficients of J in the wavelet 
basis. Each element Wk of the list w (and J) has two components Wk,i,Wk,2 
corresponding to the two angular directions on the cubed sphere. The wavelet 
transform that maps the wavelet coefficients w to the current density J is 
represented by the operator W~ l so that J = W^w. In fact W~ l works on 
both angular components separately. 

The minimization problem we want to solve is: 

w TCC = argmin H^W 1 ?/; — y\\ 2 + 2XH(w), (36) 

w 

where the reconstructed current density is J rec = W^w^c. We consider 4 
distinct cases: 

a. problem (136]) with H{w) = J2k \ w k,i\ + 1^,2 1 without the constraint 
divJ = 

b. problem (1361) with H(w) = |wfc.i| + l u 'fc,2| with the constraint div J = 


c. problem (|55|) with H{w) = ^ fc max{|t<; fc]1 |, |-U7 fci2 |} without the con- 
straint div J = 

d. problem fl36l) with H{w) = J2k max {\ w k,i\, \ w k,2\} with the constraint 
div J = 

(the sum over k is over all voxels). We want to remark that none of the four 
penalties considered here are rotationally invariant 

Another possibility to reconstruct J under the constraint divJ = 0, is 
to take J = curl(Gl r ), with G a scalar field and to reconstruct a sparse G. 
However, even if the reconstructed G has a small reconstruction error, the 
error on J can be much larger as a result of taking the curl. 

Problems ^aj) and (jb]) can be solved using algorithm (TTo'j) with S A given 
componentwise in expression ([2]). As there is no constraint involved in 
problem (jaj), (jaj) can be solved more efficiently with the accelerated soft- 
thresholding algorithm FISTA pQ. We use 2000 iterations. The problem (jb]) 
is solved with 20000 iterations of algorithm ffTol) . 

For method (jcj) and (Jd|), joint sparsity is assumed in the wavelet basis. We 
therefore simply replace the function S\ used in the algorithms for problems 
(jaj) and (jb]) by the nonlinear operator T\ defined in formula (128]) . In other 
words, we use respectively 2000 iterations of the FISTA algorithm for problem 
(jcj) and 20000 iterations of algorithm (fi~5j) for problem ([d]). 
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Figure 2: Input model J m and four different reconstructions J rcc . Each cur- 
rent density reconstruction is the minimizer of its own functional. They are 
numbered according to the list in Section [5j The color scale is the same as 
in Figure [TJ 

In each of the four methods, the penalty parameter A is chosen to fit the 
data to the noise level (|| KJ Tec — y\\ = ||e||). The results are displayed in 
Figure [2j It should be emphasized that the four reconstructions minimize 
different functionals (with or without additional constraint). The four recon- 
structions are therefore not identical, not even in the limit of infinitely many 
iterations (four different limits). 

When comparing the results, we first conclude that the FISTA algorithm 
converges faster to its fixed-point than that algorithm (1 15]) converges to its 
fixed-point. This is shown in Figure [3] for cases (|aj) and ([b]). This is to be 
expected as FISTA is an accelerated algorithm and f lT5]) is not. 

We also calculate the relative reconstruction error e rec = || J in — J rec || /|| J in || , 
in the four cases (|a])-(jd|). The results are displayed in Tabled! The methods 
that take the constraint into account perform somewhat better than the ones 
that do not take the constraint into account. Although the difference is quite 
small, it does not seem to be a fluctuation: the same result was obtained for 
other input models, in the same wavelet basis as well as in other bases. 

The values of ||divJ|| for the four reconstructions are also reported in 
Table [TJ As expected, the constraint is far better satisfied when using meth- 
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10" 



5000 10000 15000 20000 5000 10000 15000 20000 

n n 

Figure 3: Convergence speed of algorithm fjl5jl (solid) and of FISTA (dotted) 
to their respective limits as a function of the number of iterations for the 
problems (jaj) and (jb]). Left: Relative distances to the respective limits of the 
algorithms. Right: Difference of the functionals to the respective final values. 
In both cases the fixed point w of the iterations was obtained using 500000 
iterations in their respective algorithms. In case of constrained minimization, 
the functional value may go below the final limit value. This is due to the 
fact that the iterates do not satisfy the divergence-free constraint at every 
step. Again, the two algorithms solve a different minimization problem. 





m 


E 


El 


El 


^rec 


0.81 


0.75 


0.78 


0.71 


||divJ rcc || 


0.59 


3.9 ■ 10~ 6 


0.61 


8.3 ■ 10~ 6 


# nonzero in w rcc 


82 


9780 


194 


18108 



Table 1: Reconstruction data for the four simulations discussed in Section [51 
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ods (|b|) and (Jd} with algorithm (|15[) . than methods (jgj) and (jcj). One of the 
consequences is that the methods (jb]) and (Jd|) give better visual results, as 
can be observed in Figure [2j While all four methods localize the objects in 
J quite well, only those corresponding to divergence-free reconstructions are 
well structured. 

Finally, the divergence-free reconstructions are less sparse than the re- 
constructions that do not satisfy this constraint. For example when solving 
case (jaj) with the FISTA algorithm, only 82 of the 49152 coefficients (in the 
wavelet basis) are nonzero, while case ()b]) solved with algorithm (I15j) . gives 
9780 nonzero coefficients. This is due to the high number of linear constraints 
(one for every voxel). 
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